Automated Linking of Historical Data


Abstract

The recent digitization of complete count census data is an extraordinary opportunity for social scientists to create large longitudinal datasets by linking individuals from one census to another or from other sources to the census. We evaluate different automated methods for record linkage, performing a series of comparisons across methods and against hand linking. We have three main findings that lead us to conclude that automated methods perform well. First, a number of automated methods generate very low (less than 5 percent) false positive rates. The automated methods trace out a frontier illustrating the trade-off between the false positive rate and the (true) match rate. Relative to more conservative automated algorithms, humans tend to link more observations, but at the cost of higher rates of false positives. Second, when human linkers and algorithms use the same variables, there is relatively little disagreement between them. Third, across a number of plausible analyses, coefficient estimates and parameters of interest are similar when using linked samples based on each of the methods. We provide code and Stata commands to implement the various automated methods. (JEL C81, C83, N01, N31, N32)
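To make the trade-off between match rate and false positive rate concrete, here is a minimal Python sketch of one conservative linking rule of the kind the abstract describes. It is not the authors' Stata code. The rule is assumed for illustration: records must agree exactly on standardized first name, surname, and birthplace. The birth-year window widens from an exact match up to plus or minus `band` years, and a link is kept only if it is unique at the first window that yields any candidate. As an extra conservative step, target records claimed by more than one source record are dropped. The names `Person`, `link`, `evaluate`, and the toy records are hypothetical. Hand links are treated as ground truth purely for the purpose of this example.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    pid: str          # hypothetical record id, unique within its census
    first: str
    last: str
    birth_year: int
    birthplace: str


def standardize(name: str) -> str:
    # Illustrative cleaning only: uppercase and keep letters.
    return "".join(ch for ch in name.upper() if ch.isalpha())


def block_key(p: Person):
    # Exact-agreement variables used to form candidate blocks (an assumption).
    return (standardize(p.first), standardize(p.last), p.birthplace)


def link(source, target, band=2):
    """Return {source pid: target pid} under a conservative, unique-match rule."""
    index = defaultdict(list)
    for t in target:
        index[block_key(t)].append(t)

    links = {}
    for s in source:
        candidates = index.get(block_key(s), [])
        for width in range(band + 1):  # widen: exact year, +/-1, ..., +/-band
            hits = [t for t in candidates
                    if abs(t.birth_year - s.birth_year) <= width]
            if hits:
                if len(hits) == 1:  # keep only a unique match
                    links[s.pid] = hits[0].pid
                break  # ambiguous or unique: never widen further
    # Conservative step: drop target records claimed by several source records.
    claimed = Counter(links.values())
    return {s: t for s, t in links.items() if claimed[t] == 1}


def evaluate(links, truth, n_source):
    """Match rate = linked share of source records; false positive rate =
    share of links (among records with a hand link) disagreeing with `truth`."""
    checked = [s for s in links if s in truth]
    false_pos = sum(1 for s in checked if links[s] != truth[s])
    match_rate = len(links) / n_source if n_source else 0.0
    fp_rate = false_pos / len(checked) if checked else 0.0
    return match_rate, fp_rate


if __name__ == "__main__":
    source = [Person("S1", "John", "Smith", 1870, "Ohio"),
              Person("S2", "Mary", "Jones", 1865, "Texas"),
              Person("S3", "William", "Brown", 1880, "Iowa"),
              Person("S4", "James", "Lee", 1875, "Maine")]
    target = [Person("T1", "JOHN", "SMITH", 1871, "Ohio"),
              Person("T2", "Mary", "Jones", 1865, "Texas"),
              Person("T3", "Mary", "Jones", 1866, "Texas"),
              Person("T4", "William", "Brown", 1878, "Iowa"),
              Person("T6", "James", "Lee", 1874, "Maine"),
              Person("T7", "James", "Lee", 1876, "Maine")]
    # Hypothetical hand links used as ground truth.
    truth = {"S1": "T1", "S2": "T3", "S3": "T4", "S4": "T6"}
    links = link(source, target)
    print(links)                                # {'S1': 'T1', 'S2': 'T2', 'S3': 'T4'}
    print(evaluate(links, truth, len(source)))  # (0.75, 0.3333333333333333)
```

In this toy run, S4 is left unlinked because two candidates tie at plus or minus 1 year. This is how a conservative rule trades match rate for fewer false positives. Widening `band` or relaxing the uniqueness requirement would move the rule along that frontier in the other direction.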


Similar resources

Linking Historical Data on the Web

Linked Data today available on the Web mostly represent snapshots at particular points in time. The temporal aspect of data is mostly taken into account only by adding and removing triples to keep datasets up-to-date, thus neglecting the importance to keep track of the evolution of data over time. To overcome this limitation, we introduce the LinkHisData framework to automatize the creation and...


Linking Individuals Across Historical Sources: a Fully Automated Approach∗

Linking individuals across historical datasets relies on information such as name and age that is both non-unique and prone to enumeration and transcription errors. These errors make it impossible to find the correct match with certainty. We suggest a fully automated method for linking historical datasets that enables researchers to create samples that minimize type I (false positives) and type...


Using Historical Wafermap Data for Automated Yield Analysis

To be productive and profitable in a modern semiconductor fabrication environment, large amounts of manufacturing data must be collected, analyzed, and maintained. This includes data collected from in-line and off-line wafer inspection systems and from the process equipment itself. This data is increasingly being used to design new processes, control and maintain tools, and to provide the infor...


Linking Requirements and Design Data for Automated Functional Evaluation

This paper presents a methodology for automating the evaluation of complex hierarchical designs using black-box testing techniques. Based on an exploration model for design, this methodology generates evaluation tests using a novel semantic graph data model which captures the relationships between the related design and requirements data. Using these relationships, equivalent tests are generate...


The environmental-data automated track annotation (Env-DATA) system: linking animal tracks with environmental data

BACKGROUND The movement of animals is strongly influenced by external factors in their surrounding environment such as weather, habitat types, and human land use. With advances in positioning and sensor technologies, it is now possible to capture animal locations at high spatial and temporal granularities. Likewise, scientists have an increasing access to large volumes of environmental data. En...



Journal

Journal title: Journal of Economic Literature

Year: 2021

ISSN: 2328-8175, 0022-0515, 1547-1101

DOI: https://doi.org/10.1257/jel.20201599